Department of Electrical Engineering
Charlottesville, VA 22901


Abstract 


We present and discuss a class of continuous operations on the family of discrete-time stochastic processes, which serves as a guide to constructing qualitatively robust operations for a given class of processes, namely the one induced by a nominal process and a substitutive contaminating process. Our results are general enough to help develop any robust statistical procedure, but we have concentrated our attention on detection of a change from one class of processes to another (disjoint) class of processes, where both classes consist of not necessarily Markov processes and satisfy certain mixing conditions in addition to stationarity and ergodicity. Two quantitative measures of robustness, the breakdown point and the influence function, are also developed for a few examples.


0.  Introduction 

Consider two stationary and ergodic processes $[\mu_0, X_1, R]$ and $[\mu_1, X_2, R]$, where $\mu_0$ and $\mu_1$ are two distinct probability measures on $(R^{\infty}, B_{\infty})$ or $(R^{\infty}_{-\infty}, B^{\infty}_{-\infty})$ as the case may be, $X_1$ and $X_2$ are their names, and $R$ is the real line on which both processes take their values. As usual, $(R^{\infty}, B_{\infty})$ and $(R^{\infty}_{-\infty}, B^{\infty}_{-\infty})$ denote respectively the one-sided and two-sided infinite product of the real line with itself, with their corresponding product $\sigma$-algebras. Let $\{W_j\}_{j \ge 1}$ denote an observed sequence. Suppose we start observing at time instant one and suppose that initially the process $\mu_0$ is active. Suppose that at some time instant $t \ge 1$, process $\mu_0$ becomes inactive and $\mu_1$ becomes active and remains so. Our objective is to formulate a meaningful test to detect this shift from $\mu_0$ to $\mu_1$. To attain this objective a number of algorithms have been proposed, developed and studied in the literature. The most widely studied ones are Page's algorithm, Page (1954), and the Shiryayev-Roberts algorithm, Shiryayev (1963) and Roberts (1966). Page developed the algorithm under an i.i.d. setup, and Lorden (1971) studied it and proved its asymptotic optimality using a minimax criterion. Bansal and Papantoni-Kazakos (1986) modified it and proved its optimality under a non-i.i.d. setup.

But suppose our description of the two measures under consideration is imperfect, or perhaps our observations are vulnerable to contamination by another unknown measure. We then need to develop robust (outlier-resistant) algorithms in order to achieve relatively stable performance, possibly by sacrificing some of the efficiency that is achievable at the ideal model (i.e. in the absence of any deviation whatsoever from the assumed structure). Before we proceed further, however, let us state our generalized observation model in a precise manner.

Let $Y_j$ be the observation taken at time instant $j$, where

$$Y_j = (1 - A_j)\, W_j + A_j Z_j, \quad j \ge 1. \qquad (1)$$

Here $W_j$ is governed by the nominal measure $\mu_0$ or $\mu_1$, $\{Z_j\}$ is an i.i.d. process which is arbitrary, and $\{A_j\}$ is a binary i.i.d. process with

$$\Pr\{A_j = 1\} = 1 - \Pr\{A_j = 0\} = \epsilon.$$
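The observation model (1) can be illustrated with a short simulation. This is only a sketch under stated assumptions: the function name `contaminate`, the Gaussian nominal sequence, the constant outlier value 50 and the value of `eps` are all hypothetical choices made for illustration and do not come from the paper.

```python
import random

def contaminate(nominal, outlier_source, eps, rng):
    """Apply Y_j = (1 - A_j) W_j + A_j Z_j with Pr{A_j = 1} = eps."""
    observed, flags = [], []
    for w in nominal:
        a = 1 if rng.random() < eps else 0   # binary i.i.d. indicator A_j
        z = outlier_source(rng)              # i.i.d. contaminant Z_j (arbitrary law)
        observed.append((1 - a) * w + a * z)
        flags.append(a)
    return observed, flags

rng = random.Random(0)
# Hypothetical nominal sequence: i.i.d. standard Gaussian samples.
w = [rng.gauss(0.0, 1.0) for _ in range(10000)]
# Hypothetical contaminant: every outlier sits at the value 50.
y, a = contaminate(w, lambda r: 50.0, eps=0.05, rng=rng)
print(sum(a) / len(a))  # empirical contamination frequency, close to eps
```

Each observation is either the nominal value or the outlier, never a blend of the two, which is what "substitutive" contamination means here.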

Essentially, each nominal random variable $W_j$ is replaced by an arbitrary random variable $Z_j$ with frequency (probability) $\epsilon$ before we get to observe it. $\{Z_j\}$ is the contaminating process and $\{A_j\}$ determines the contamination law. In the absence of the contaminating process ($\epsilon = 0$), we have perfect observations and can then apply the optimal algorithm of Bansal et al (1986). However, under contamination, the optimal algorithm becomes totally unreliable in the sense that just a single bad observation can overwhelm the evidence provided by other good observations and upset the decision. This will become apparent the moment we see the optimal test, which is defined as follows.

Given the (uncontaminated) data sequence $w = (w_1, w_2, \ldots)$ and letting $w_1^n$ denote the finite sequence $(w_1, w_2, \ldots, w_n)$, stop at

$$N_\delta^0(w) = \inf\{n : T_n^0(w_1^n) \ge \log\delta\} \qquad (2)$$

where $\log\delta$ is the logarithmic threshold chosen and

$$T_n^0(w_1^n) = \max_{1 \le k \le n+1} \sum_{i=k}^{n} \log\frac{f_1(w_i \mid w_1^{i-1})}{f_0(w_i \mid w_1^{i-1})} \qquad (3)$$

is the test statistic with appropriate end conditions, and $f_i(\cdot \mid \cdot)$, $i = 0, 1$, denote the conditional densities of $\mu_i$ with respect to an appropriate $\sigma$-finite measure, whose existence we assume. See Bansal et al (1986) for other necessary regularity conditions and complete details. Notice from the expression of $T_n^0(w_1^n)$ that a single term inside the summation can make $T_n^0(w_1^n)$ too large or too small if $\mu_1$ and $\mu_0$ do not have compact support. Therefore $N_\delta^0(w)$ can be too small or too large, and in essence the test may become unreliable. This is invariably the problem with all the classical parametric tests or estimators, many of which are optimal in an appropriate sense. Over the last twenty-five years or so we have become more aware of our inability to model a phenomenon accurately and of the vulnerability of our observations to gross errors, and have focused our attention on the development of robust procedures at the expense of efficiency or optimality. Naturally one would like to quantify robustness in order to evaluate the tradeoff, which opens the new area of optimal robust procedures. Often, however, because of the complexity of the observation model it becomes (or at least appears to be) impossible to design optimal robust procedures. In our problem at hand we perceive this handicap. Therefore, we have attempted to look for intuitively meaningful procedures and examine their performance in terms of efficiency (or loss thereof), the breakdown point and the influence function. The structure of the optimal algorithm is used as the starting point and the guide to the development of such procedures. Before we discuss our approach it is important to consider the special cases of the observation model in (1) and the procedures that have already been studied in the literature. This will serve two purposes: first, it offers insight into what we could reasonably expect from our robust algorithms for the general case; second, it shows the inapplicability, in the general case, of the existing robust algorithms developed under those special cases which we are about to discuss.
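The sensitivity of the stopping rule (2)-(3) to a single gross error can be seen in a small numerical sketch. It assumes, purely for illustration, an i.i.d. pair $f_0 = N(0,1)$, $f_1 = N(\theta,1)$ and hypothetical values of `theta` and `log_delta`; the maximum over $k$ in (3) is evaluated by the equivalent recursion $T_n = \max(0, T_{n-1} + \text{increment})$, which follows from allowing the empty sum at $k = n+1$.

```python
def gaussian_llr(w, theta):
    """log f1(w)/f0(w) for the assumed pair f0 = N(0, 1), f1 = N(theta, 1)."""
    return theta * (w - theta / 2.0)

def page_stopping_time(data, llr, log_delta):
    """First n with T_n >= log_delta, or None if the threshold is never reached."""
    t = 0.0  # the empty sum (k = n + 1) keeps the statistic non-negative
    for n, w in enumerate(data, start=1):
        t = max(0.0, t + llr(w))  # T_n = max(0, T_{n-1} + log-likelihood ratio of w_n)
        if t >= log_delta:
            return n
    return None

theta, log_delta = 1.0, 5.0          # hypothetical design values
clean = [0.0] * 20                   # stylized pre-change data at the mu_0 mean
spiked = list(clean)
spiked[9] = 12.0                     # one gross error at time 10, no real change

n_clean = page_stopping_time(clean, lambda w: gaussian_llr(w, theta), log_delta)
n_spiked = page_stopping_time(spiked, lambda w: gaussian_llr(w, theta), log_delta)
print(n_clean, n_spiked)  # None 10
```

With unbounded increments, the single outlier alone pushes the statistic past the threshold and triggers a false detection.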

Note from (1) that if all three component processes $\{W_j\}$, $\{A_j\}$ and $\{Z_j\}$ are i.i.d., then the process $\{Y_j\}$ is i.i.d. and it suffices to develop procedures that are based on the one-dimensional marginal distribution alone. In fact, $\{Y_j\}$ then has the alternative description

$$f_{Y_j}(y) = (1-\epsilon)\, f_{W_j}(y) + \epsilon\, h(y) \qquad (4)$$

where $h(y)$ is the density function of $Z_j$. Also, $\{W_j\}$ being i.i.d. under both $\mu_1$ and $\mu_0$ means

$$T_n^0(w_1^n) = \max_{1 \le k \le n+1} \sum_{i=k}^{n} \log\frac{f_1(w_i)}{f_0(w_i)}. \qquad (5)$$

Under these stricter conditions on the process $\{Y_j\}$ it seems natural to replace the pair $(f_1(w_i), f_0(w_i))$ by a least favorable pair $(q_1(w_i), q_0(w_i))$, where $q_1(\cdot)$, $q_0(\cdot)$ minimize the Kullback distance. See Huber (1965) for details. Notice that (4) and (1) both describe two distinct classes of processes obtained from the two nominal measures $\mu_1$ and $\mu_0$.

Let us denote the new test statistic by $T'_n(w_1^n)$ and the resulting stopping variable by $N'_\delta(w)$, where

$$T'_n(w_1^n) = \max_{1 \le k \le n+1} \sum_{i=k}^{n} \log\frac{q_1(w_i)}{q_0(w_i)}. \qquad (6)$$

Notice that $N'_\delta(w)$ is the stopping variable resulting from Page's test applied to the least favorable pair of (i.i.d.) processes, and it detects the change from the first member of that pair to the second in an optimal manner, as shown by Lorden (1971). Since this test resulted from the minimax robustification of the test for the $\mu_0$ to $\mu_1$ shift, it becomes the optimal minimax robust test, which we prefer to call optimal in the super minimax sense. This was quite straightforward because of the i.i.d. structure of all the component processes, which in turn induced an i.i.d. structure on the observation process.
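The effect of passing to a least favorable pair can be sketched numerically. In Huber's (1965) construction the likelihood ratio of the least favorable pair is a censored (clipped) version of the nominal one, so the sketch below simply clips the increments of Page's recursion. The Gaussian pair $N(0,1)$ versus $N(\theta,1)$ and the clipping levels `c_lo`, `c_hi` are hypothetical; in the actual construction the levels are determined by the contamination level $\epsilon$.

```python
def clipped_llr(w, theta, c_lo, c_hi):
    """Nominal Gaussian log-likelihood ratio, censored to the interval [c_lo, c_hi]."""
    raw = theta * (w - theta / 2.0)
    return min(max(raw, c_lo), c_hi)

def clipped_page_stopping_time(data, theta, c_lo, c_hi, log_delta):
    """Page's recursion driven by the censored increments; None if never stopped."""
    t = 0.0
    for n, w in enumerate(data, start=1):
        t = max(0.0, t + clipped_llr(w, theta, c_lo, c_hi))
        if t >= log_delta:
            return n
    return None

# Hypothetical design values, chosen only for illustration.
theta, log_delta, c_lo, c_hi = 1.0, 5.0, -2.0, 2.0

outlier_only = [0.0] * 9 + [12.0] + [0.0] * 10   # one gross error, no change
real_change = [0.0] * 10 + [1.0] * 20            # mean shifts to theta at time 11

n_outlier = clipped_page_stopping_time(outlier_only, theta, c_lo, c_hi, log_delta)
n_change = clipped_page_stopping_time(real_change, theta, c_lo, c_hi, log_delta)
print(n_outlier, n_change)  # None 20
```

A single outlier can now contribute at most `c_hi` to the statistic, so it no longer forces a stop, while a sustained shift is still detected.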

But suppose that our nominal measures are not i.i.d. but Markovian. (Note that the observation process is then no longer Markovian even though the component processes are.) Then again one is tempted to robustify $T_n^0(w_1^n)$ in (3) by applying a suitable transformation to $f_1(w_i \mid w_1^{i-1}) / f_0(w_i \mid w_1^{i-1})$. In order to apply the approach used for the i.i.d. case we need to obtain two classes of conditional densities similar to the ones given in (4). It turns out that it is impossible to obtain an exact description of the model in (1) in the mixture form given in (4). Note that (4) and (1) are equivalent under the i.i.d. setup, but otherwise (4) is a strict enlargement of (1). But (4) by itself does not enable us to obtain a suitable replacement of the conditional log likelihood ratios. To overcome this problem we have used two approaches: one leading to an approximate description of (1) in mixture form, based on the nominal conditional densities of the two measures $\mu_1$ and $\mu_0$, and the other being a strict enlargement of (1), using the variational metric and an additional mixing assumption on the nominal measures. Then Huber's operations were applied and the corresponding pairs of least favorable conditional densities were used to obtain a test statistic, which we denote by $T''_n(w_1^n)$ here. These results were developed and reported in detail in Bansal and Papantoni-Kazakos (1987a), and in part and in condensed form in Bansal and Papantoni-Kazakos (1987b). Under the Markovian assumption on the nominal measures $\mu_1$ and $\mu_0$, the conditional likelihood ratios in (3) have finite memory. Huber's operation induces uniformly bounded conditional likelihood ratios, where the lower and upper bounds both depend on the finite number of 'past' observations. Moreover, if the nominal conditional densities are continuous functions of the observation block, then the modified densities are also continuous. Boundedness of each term under the summation, its continuity as a point function and its dependence on only a finite number of variables suffice to ensure the qualitative robustness of the test statistic $T''_n(w_1^n)$, the corresponding stopping variable and the sequence of functionals $\{E_\mu(n^{-1} T''_n(w_1^n))\}$. Here $E_\mu\{\cdot\}$ denotes the expected value under the measure $\mu$. Readers are referred to Boente et al (1987) and Papantoni-Kazakos (1987) for an extensive discussion of qualitative robustness for stochastic processes. However, the moment we relax the Markovian assumption, our nominal conditional densities depend on the entire past, which grows to infinity as the sample size goes to infinity. Then the quantities under the summation will have an unbounded number of arguments and the functional $\lim_{n\to\infty} E_\mu(n^{-1} T''_n(w_1^n))$ will depend on the entire process and not just on finite-order marginals of the process. Under these circumstances, boundedness of $n^{-1} T''_n(w_1^n)$ is not enough to guarantee qualitative robustness in general. Counterexamples illustrating this phenomenon in the case of estimation of the parameters of a moving average process are given in Martin and Yohai (1986) and Boente et al (1987). The phenomenon is explained in different ways in the above two works, but we will provide our own explanations toward the end of this section.

However, the robustness of the 'Huberized' conditional log likelihood ratios used in $T''_n(w_1^n)$ can be restored if we artificially limit the memory used in the likelihood ratios. But we need to choose that memory judiciously, because too large a memory will result in weaker robustness (as measured through the breakdown point, for example) and higher efficiency, while too small a memory will result in stronger robustness and lower efficiency. This approach suffers from another weakness: the resulting algorithm will not reduce to the optimal one unless we let the size of the memory depend on the design parameter $\epsilon$ such that it goes to infinity as $\epsilon \downarrow 0$. We intend to report the work on this issue elsewhere. The results we are going to report in this paper are obtained by using an alternative approach. But before we discuss these, it is profitable to discuss some of the limiting properties of the algorithm based on $T''_n(w_1^n)$.


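By definition,

$$T''_n(w_1^n) = \max \ \ldots \qquad (7)$$

For convenience, we had proposed the minor modification of $T''_n(w_1^n)$ in Bansal et al (1987a, 1987b)

$$T'''_n(w_1^n) = \max_{1 \le k \le n+1} \sum_{i=k}^{n} \ln\frac{q_1(w_i \mid w_1^{i-1})}{q_0(w_i \mid w_1^{i-1})} = \max_{1 \le k \le n+1} \sum_{i=k}^{n} g'''_i(w_1^i) \qquad (8)$$

where

$$g'''_i(w_1^i) = \begin{cases} c_{1,\epsilon}(w_1^{i-1}), & \ln\dfrac{f_1(w_i \mid w_1^{i-1})}{f_0(w_i \mid w_1^{i-1})} \le c_{1,\epsilon}(w_1^{i-1}) \\[2ex] \ln\dfrac{f_1(w_i \mid w_1^{i-1})}{f_0(w_i \mid w_1^{i-1})}, & c_{1,\epsilon}(w_1^{i-1}) < \ln\dfrac{f_1(w_i \mid w_1^{i-1})}{f_0(w_i \mid w_1^{i-1})} < c_{0,\epsilon}(w_1^{i-1}) \\[2ex] c_{0,\epsilon}(w_1^{i-1}), & \ln\dfrac{f_1(w_i \mid w_1^{i-1})}{f_0(w_i \mid w_1^{i-1})} \ge c_{0,\epsilon}(w_1^{i-1}) \end{cases} \qquad (9)$$

and $c_{1,\epsilon}(w_1^{i-1})$ and $c_{0,\epsilon}(w_1^{i-1})$ are the lower and upper thresholds determined by applying Huber's operation. These thresholds depend on $w_1^{i-1}$ at stage $i$, but are uniformly bounded below and above by $-\ln\frac{1-\epsilon}{\epsilon}$ and $\ln\frac{1-\epsilon}{\epsilon}$ respectively.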


Compare $T'_n$, $T''_n$ and $T'''_n$ in (6), (7) and (8) respectively. Note that if the observation process is i.i.d., $T''_n$ and $T'''_n$ both reduce to $T'_n$, which was seen to be optimal in the super minimax sense. Next, as $\epsilon \downarrow 0$, our two classes of processes described in (1) reduce to a single pair $(\mu_0, \mu_1)$, and then $T'''_n(w_1^n)$ reduces to $T_n^0(w_1^n)$, which has been proved to be optimal. Seeing these two features of our proposed algorithm in the two extreme (or limiting) cases, we hope to have achieved good robustness without sacrificing a lot of efficiency. Therefore, in whatever we suggest and study next, we would like to retain these two attractive features. This is used as the basic guideline to develop robust algorithms under the most general set of assumptions on the nominal measures.


Here then is a brief outline of the rest of the paper. In section I, we discuss the nature of qualitative robustness for stochastic processes from a functional-analytic approach, describe a convenient class of qualitatively robust operations and provide a simple counterexample to illustrate that boundedness and pointwise, coordinatewise continuity are not enough to ensure robustness in general. Then in section II, a simple approach is described to obtain meaningful robust substitutes for the nominal likelihood ratio, which works for a pair of linear processes as nominals. In section III, we investigate the approach in detail on an example of first-order moving average processes and evaluate the efficiency, the breakdown point and a version of the influence function. Finally, in the conclusion, other interesting possibilities for further research are pointed out.


I.  Qualitative  Robustness  for  Stochastic  Processes 

Consider the space of measures $M$ defined on $(R^{\infty}, B_{\infty})$. Let $w = \{w_i\}$ denote a realization of the process. We are interested in robust (continuous) functionals on the space $M$, and in particular continuous linear functionals on $M$, because most test statistics or estimators induce a linear operation on $M$ itself in general, or in special cases on the restrictions of $M$ to an $n$-dimensional Euclidean space, the latter denoted by $M_n$. To ensure robustness, our choice then centers on the functionals which are continuous with respect to the weak star topology on $M$ or its restriction $M_n$, as the case may be. For example, when we focus our attention only on those members of $M$ which generate i.i.d. processes, our statistical operations induce functionals (often linear) on $M_1$, and for a class of Markov processes of finite order ($m$, say) they induce functionals on $M_m$ alone. Robustness of operations in these two cases is sufficiently well understood. Regarding the justification for the choice of the weak star topology on $M_n$, see Hampel (1971), Boente et al (1987), etc. A weak star neighborhood (since it is metrizable by the Prohorov metric) captures both kinds of deviations from the nominal, namely a small number of gross errors (outliers) and small errors in a large number of observations. And the weak star topology on $M$ itself will induce the weak star topology on $M_n$. Knowing that the weak star topology can be generated by the Prohorov metric, it is important in this case to have a proper choice of the distortion measure itself on the data sequences. In fact, the usual Euclidean metric suffices for $R^n$, but there is no equivalent of that on $R^{\infty}$. There are three natural ways, however. Two of them are the so-called uniform metric $\rho_u$ and its averaged counterpart $\bar\rho$,

$$\rho_u(x, y) = \max_i \rho(x_i, y_i); \qquad \bar\rho(x, y) = \lim_{n\to\infty} n^{-1} \sum_{i=1}^{n} \rho(x_i, y_i)$$

where $x = (x_1, \ldots)$, $y = (y_1, \ldots) \in R^{\infty}$ and $\rho$ is a bounded metric generating the usual topology on $R^1$. Incidentally, $\rho_u$ and $\bar\rho$ are both so strong that they make even our usual estimators, such as the mean, robust. This is because they induce a strong topology on $R^{\infty}$, which in turn induces a strong topology on $M$. In fact, the right choice turns out to be the product topology, which is weak enough to correspond to our desired notion of robustness. It can be induced by a metric of the following form

$$\rho_p(x, y) = \sum_{i=1}^{\infty} \alpha_i\, \rho(x_i, y_i) \qquad (10)$$

where $\alpha_i > 0$ and $\sum_i \alpha_i$ converges, and $\rho$ is a metric which induces the usual topology on $R^1$.

The above justification leads us to concentrate on the class of continuous, linear functionals on $M$, where $M$ is endowed with the weak star topology induced by continuous functions (with respect to $\rho_p$, or the product topology). Symbolically, we are interested in linear functionals $T_\psi : M \to R$ of the following form

$$T_\psi(\mu) = \int \psi(x)\, \mu(dx), \quad x \in R^{\infty},\ \mu \in M \qquad (11)$$

where $\psi : R^{\infty} \to R$ is continuous and bounded on $R^{\infty}$. It is trivial to see that for a $\psi$ which is continuous (coordinatewise) and bounded, its restriction to $R^n$ induces continuous functionals on $M_n$. It is possible, however, to have $\psi$ bounded and continuous (coordinatewise) yet not continuous in the product topology on $R^{\infty}$. An example:

$$\psi(x) = \max_i\, \min(|x_i|, 1)$$

Therefore, $T_\psi(\mu) = \int \max_i \min(|x_i|, 1)\, \mu(dx)$ is not continuous with respect to $\mu$, even though $\psi(x)|_{R^n}$ is continuous and bounded and, therefore, $T_\psi(\mu|_{R^n})$ is continuous on $M_n$.
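The failure of continuity in this example can be checked numerically. The sketch below represents sequences that are zero outside finitely many coordinates as sparse dictionaries, and assumes the hypothetical weights $\alpha_i = 2^{-i}$ and the bounded metric $\rho(u, v) = \min(|u - v|, 1)$; neither choice is taken from the paper. The sequences $e_n$ (a single 1 in coordinate $n$) approach the zero sequence in the product metric but not in the uniform metric, while $\psi(e_n)$ stays at 1.

```python
def rho(u, v):
    """A bounded metric on the real line generating the usual topology."""
    return min(abs(u - v), 1.0)

def product_metric(x, y, alpha=lambda i: 2.0 ** (-i)):
    """sum_i alpha_i * rho(x_i, y_i) for sparse sequences (dict: index -> value)."""
    idx = set(x) | set(y)
    return sum(alpha(i) * rho(x.get(i, 0.0), y.get(i, 0.0)) for i in idx)

def uniform_metric(x, y):
    """max_i rho(x_i, y_i) for sparse sequences."""
    idx = set(x) | set(y)
    return max((rho(x.get(i, 0.0), y.get(i, 0.0)) for i in idx), default=0.0)

def psi(x):
    """psi(x) = max_i min(|x_i|, 1), the bounded function of the example."""
    return max((min(abs(v), 1.0) for v in x.values()), default=0.0)

zero = {}
for n in (1, 5, 20, 50):
    e_n = {n: 1.0}  # the sequence with a single 1 in coordinate n
    print(n, product_metric(e_n, zero), uniform_metric(e_n, zero), psi(e_n) - psi(zero))
```

The product distance shrinks like $2^{-n}$, yet the jump in $\psi$ remains equal to 1, so $\psi$ cannot be continuous in the product topology.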

Now  adapting  the  general  discussion  above  to  our 
observation  model  in  (1),  we  notice  that  we  are  considering 
the  narrow  class  of  stationary  and  ergodic  members  of  M  as 
described  by  (1).  We  will  assume  that  the  classes  in  (1) 
inherit  the  subspace  topology  from  M. 

II.  Linear  Processes 


Consider the following pair of nominal measures.

$$\mu_0 : W_n = \sum_{i \ge 1} a_i W_{n-i} + U_n$$
$$\mu_1 : W_n = \sum_{i \ge 1} b_i W_{n-i} + V_n \qquad (12)$$

where $\{a_i\}$, $\{b_i\}$; $i \ge 1$ are distinct sequences of real numbers and $\{U_n\}$ and $\{V_n\}$ are i.i.d. sequences of Gaussian random variables with identical variance $\sigma^2$. Let

$$E U_n = m_0; \quad E V_n = m_1.$$

Note that the above class is general enough to contain ARMA processes.


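Define

$$A(w_1^{n-1}) = \sigma^{-1}\left[\sum_{i=1}^{n-1} (b_i - a_i)\, w_{n-i} + (m_1 - m_0)\right]$$

$$B(w_1^{n-1}) = \sigma^{-1}\left[\sum_{i=1}^{n-1} (b_i + a_i)\, w_{n-i} + (m_1 + m_0)\right] \qquad (13)$$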


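Then the optimal stopping rule $N_\delta^0(w)$ is

$$N_\delta^0(w) = \min\{L_\delta^0(w_k^{\infty}) + k - 1 : k = 1, 2, 3, \ldots\}$$

(Bansal et al (1986)), where

$$L_\delta^0(w) = \inf\Big\{n : \sum_{i=1}^{n} g_i^0(w_1^i) \ge \log\delta\Big\} \qquad (14)$$

and

$$g_i^0(w_1^i) = 2^{-1} A(w_1^{i-1}) \left[2\sigma^{-1} w_i - B(w_1^{i-1})\right].$$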

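Asymptotically, as the number of observations increases, we have (with convergence in distribution)

$$g_i^0(w_1^i) \xrightarrow{D} g^0(w_{-\infty}^{0})$$
$$A(w_1^{i-1}) \xrightarrow{D} A(w_{-\infty}^{-1}) \qquad (15)$$
$$B(w_1^{i-1}) \xrightarrow{D} B(w_{-\infty}^{-1}).$$

Also, the rate at which a shift from $\mu_0$ to $\mu_1$ is detected is determined by $E_{\mu_1}\{g^0(w_{-\infty}^{0})\}$.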
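The closed form for $g_i^0$ in terms of $A$ and $B$ can be checked against a direct evaluation of the Gaussian conditional log-likelihood ratio. The coefficient values, the finite window of past samples and all function names below are hypothetical and serve only to verify the algebra numerically.

```python
import math

def cond_mean(coeffs, past, m):
    """sum_i c_i * w_{n-i} + m, with past[0] = w_{n-1}, past[1] = w_{n-2}, ..."""
    return sum(c * w for c, w in zip(coeffs, past)) + m

def g0(w_n, past, a, b, m0, m1, sigma):
    """g^0 = (1/2) * A * (2 * w_n / sigma - B), with A and B as in (13)."""
    A = (sum((bi - ai) * w for ai, bi, w in zip(a, b, past)) + (m1 - m0)) / sigma
    B = (sum((bi + ai) * w for ai, bi, w in zip(a, b, past)) + (m1 + m0)) / sigma
    return 0.5 * A * (2.0 * w_n / sigma - B)

def log_gauss(x, mean, sigma):
    """Log density of N(mean, sigma^2) at x."""
    return -0.5 * ((x - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

# Hypothetical coefficients and data, for the algebra check only.
a = [0.5, -0.2, 0.1]
b = [0.3, 0.4, -0.1]
m0, m1, sigma = 0.0, 0.7, 1.5
past = [1.2, -0.7, 0.4]   # w_{n-1}, w_{n-2}, w_{n-3}
w_n = 0.9

direct = (log_gauss(w_n, cond_mean(b, past, m1), sigma)
          - log_gauss(w_n, cond_mean(a, past, m0), sigma))
closed_form = g0(w_n, past, a, b, m0, m1, sigma)
print(direct, closed_form)
```

The agreement follows from the identity $(w - M_0)^2 - (w - M_1)^2 = (M_1 - M_0)(2w - M_1 - M_0)$, where $M_0$ and $M_1$ are the two conditional means.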

$A(w_{-\infty}^{-1})$ and $B(w_{-\infty}^{-1})$ are not continuous functions of $(w_{-1}, w_{-2}, \ldots) \in R^{\infty}$. However, a simple trick will make them continuous and bounded at the same time. If we replace $\{w_i\}$ by, say, $\{w'_i\}$, where the $w'_i$ are uniformly bounded, then under the usual conditions on $\{a_i\}$ and $\{b_i\}$, $A(w'^{-1}_{-\infty})$ and $B(w'^{-1}_{-\infty})$ will be continuous and bounded functions of the entire one-sided sequence. Therefore $g^0(w'^{0}_{-\infty})$ and the corresponding normalized sums $n^{-1}\sum_{i=1}^{n} g_i^0(w'^{i}_{1})$, $n \ge 1$, will all be continuous and bounded functions on their respective domains. As a result, the functional

$$T'(\mu) = \int g'(w_{-\infty}^{0})\, \mu(dw_{-\infty}^{0}) \qquad (16)$$

will be continuous, where

$$g'(w_{-\infty}^{0}) \triangleq g^0(w'^{0}_{-\infty}) \qquad (17)$$

and the new rate will be determined by

$$T'(\mu_1) = E_{\mu_1}\{g'(w_{-\infty}^{0})\}. \qquad (18)$$

A simple approach to constructing $w'_i$ from $w_i$ is to use Huber's operations on the marginal log-likelihood ratio, which leads to fixed lower and upper bounds on $\ln(f_1(w_i)/f_0(w_i))$. These bounds can be mapped back to the $w_i$ space to obtain $w'_i$. Some $\epsilon'$ can be used as the design parameter. The advantage of this approach is that as $\epsilon' \downarrow 0$, $w'_i \to w_i$, that is, we return to the ideal case; and when $\mu_1$ and $\mu_0$ are both i.i.d., we return to the optimal robust operation.

For the breakdown point and the influence function we will use the following definitions, which are the same as the ones used in our previous studies, Bansal et al (1987a, b).

Suppose $\mu_{i,\zeta,z}$ denotes the measure induced by the nominal $\mu_i$ and an i.i.d. sequence of outliers occurring with frequency (probability) $\zeta$ and magnitude $z$.

Our measure of both efficiency and credibility is the quantity $E_{\mu_i}\{g'(w_{-\infty}^{0})\}$, because it determines the asymptotic rate with which our algorithm detects the change. In the absence of contamination, $E_{\mu_1}\{g'(w_{-\infty}^{0})\}$ is positive and $E_{\mu_0}\{g'(w_{-\infty}^{0})\}$ is negative. In the presence of a strong contaminating measure either or both may reverse their sign, and then our algorithm becomes useless. So it is of interest to find the largest percentage of outliers (for example) that our algorithm can withstand. Formally,

(Definition) The breakdown point is the largest frequency $\zeta$ of outliers such that $E_{\mu_{i,\zeta,z}}\{g'(w_{-\infty}^{0})\}$ still retains its nominal algebraic sign for $i = 0, 1$. Here $z$ will be chosen such that it leads to the worst case, or earliest breakdown.

Next, the influence function measures the normalized influence of a single observation at a particular value $z$ on the quantities of interest, which are again $E_{\mu_i}\{g'(w_{-\infty}^{0})\}$, $i = 0, 1$. Formally,

(Definition) The influence function $IF_{\mu_i}(z)$ is

$$IF_{\mu_i}(z) = \lim_{\zeta \to 0} \frac{E_{\mu_{i,\zeta,z}}\{g'(w_{-\infty}^{0})\} - E_{\mu_i}\{g'(w_{-\infty}^{0})\}}{\zeta} \qquad (19)$$

III. An Example

Let

$$\mu_0 : W_n = U_n - a\, U_{n-1}$$
$$\mu_1 : W_n = U_n - a\, U_{n-1} + \theta \quad (\theta > 0). \qquad (20)$$

$U_n \sim N(0, 1)$ and $0 < a$. $N(\cdot, \cdot)$ refers to the Gaussian distribution. Alternatively,

$$\mu_0 : W_n = -\sum_{i \ge 1} a^i\, W_{n-i} + U_n$$
$$\mu_1 : W_n = -\sum_{i \ge 1} a^i\, W_{n-i} + V_n \qquad (21)$$

where $V_n \sim N(\theta/(1-a), 1)$.

Also, $W_n \sim N(0, 1+a^2)$ under $\mu_0$ and $W_n \sim N(\theta, 1+a^2)$ under $\mu_1$. Note that $\mu_0$, $\mu_1$ both have infinite memory.


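$$\ln\frac{f_1(w_n)}{f_0(w_n)} = \frac{\theta}{1+a^2}\left[w_n - \frac{\theta}{2}\right] \qquad (22)$$

For a given $\epsilon'$, Huber's operation transforms the likelihood ratio as shown in the picture.

From Fig. 1 we obtain the following description of $w'_n$:

$$w'(w_n) = \begin{cases} w_n, & w_n \in \left[-d(1+a^2)/\theta + \theta/2,\ d(1+a^2)/\theta + \theta/2\right] \\ w'_m = -d(1+a^2)/\theta + \theta/2, & w_n < -d(1+a^2)/\theta + \theta/2 \\ w'_M = d(1+a^2)/\theta + \theta/2, & w_n > d(1+a^2)/\theta + \theta/2 \end{cases} \qquad (23)$$

Thus

$$g'(w_{-\infty}^{0}) \triangleq g^0(w'^{0}_{-\infty}) = \frac{\theta}{1-a}\left[\sum_{i \ge 0} a^i\, w'_{-i} - \frac{\theta}{2(1-a)}\right] \qquad (24)$$

Therefore

$$E_{\mu_1}\, g'(w_{-\infty}^{0}) = \frac{\theta}{(1-a)^2}\left[E_{\mu_1} w'_0 - \frac{\theta}{2}\right] \qquad (25)$$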
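The pseudo-observations and the resulting statistic can be sketched in a few lines. The parameter values `theta`, `a`, `d` and the finite window of recent samples are hypothetical; in particular the infinite sum over the past is truncated to whatever window is supplied, which is an approximation made only for this illustration.

```python
def clip_bounds(theta, a, d):
    """Lower and upper pseudo-observation levels w'_m and w'_M."""
    half_width = d * (1.0 + a * a) / theta
    return theta / 2.0 - half_width, theta / 2.0 + half_width

def pseudo(w, theta, a, d):
    """Map an observation w to its pseudo-observation w'."""
    lo, hi = clip_bounds(theta, a, d)
    return min(max(w, lo), hi)

def marginal_llr(w, theta, a):
    """Marginal log-likelihood ratio theta / (1 + a^2) * (w - theta / 2)."""
    return theta / (1.0 + a * a) * (w - theta / 2.0)

def g_prime(recent_first, theta, a, d):
    """theta/(1-a) * [sum_i a^i w'_{-i} - theta/(2(1-a))], truncated to the window.

    recent_first[0] is the current sample w_0, recent_first[1] is w_{-1}, and so on.
    """
    s = sum((a ** i) * pseudo(w, theta, a, d) for i, w in enumerate(recent_first))
    return theta / (1.0 - a) * (s - theta / (2.0 * (1.0 - a)))

theta, a, d = 1.0, 0.5, 1.0          # hypothetical design values
lo, hi = clip_bounds(theta, a, d)
print(lo, hi)                        # -0.75 1.75
print(marginal_llr(pseudo(1e6, theta, a, d), theta, a))    # approximately  d
print(marginal_llr(pseudo(-1e6, theta, a, d), theta, a))   # approximately -d
print(g_prime([1e6, 0.2, -0.3], theta, a, d))              # stays bounded
```

Clipping the observation to $[w'_m, w'_M]$ is the same as clipping its marginal log-likelihood ratio to $[-d, d]$, because that ratio is an increasing linear function of $w_n$.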


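Recalling that $\mu_{i,\zeta,z}$ denotes the measure induced by the nominal $\mu_i$ and an i.i.d. sequence of outliers occurring with frequency (probability) $\zeta$ and magnitude $z$, we have

$$E_{\mu_{i,\zeta,z}}\, w'_0 = (1-\zeta)\, E_{\mu_i} w'_0 + \zeta\, w'(z).$$

Since $E_{\mu_1}\, g'(w_{-\infty}^{0}) > 0$, breakdown in (25) will occur due to negative extreme outliers, when

$$E_{\mu_{1,\zeta,-\infty}}\{g'(w_{-\infty}^{0})\} = \frac{\theta}{(1-a)^2}\left[E_{\mu_{1,\zeta,-\infty}}\, w'_0 - \frac{\theta}{2}\right] \le 0 \qquad (26)$$

or

$$(1-\zeta)\, E_{\mu_1} w'_0 + \zeta\, w'(-\infty) - \frac{\theta}{2} \le 0$$

that is, at

$$\zeta_1 = 1 \Big/ \left[1 + \frac{(1+a^2)}{\theta} \cdot \frac{d(\epsilon')}{E_{\mu_1} w'_0 - \theta/2}\right]. \qquad (27)$$

Similarly, breakdown can occur when

$$E_{\mu_{0,\zeta,\infty}}\{g'(w_{-\infty}^{0})\} > 0$$

which gives the breakdown at

$$\zeta_0 = 1 \Big/ \left[1 + \frac{d(1+a^2)}{\theta\,(\theta/2 - E_{\mu_0} w'_0)}\right]. \qquad (28)$$

Therefore, the overall breakdown point is

$$\zeta^{*} = \min(\zeta_0, \zeta_1). \qquad (29)$$

Comments

(1) As $\epsilon' \downarrow 0$, $d(\epsilon') \uparrow \infty$, $w'_n \to w_n$ and

$$E_{\mu_1}\, g'(w_{-\infty}^{0}) \to E_{\mu_1}\, g^0(w_{-\infty}^{0}) = \frac{\theta^2}{2(1-a)^2}.$$

(2) As $\epsilon' \uparrow$ (to its maximum allowable value, which may be less than 1/2),

$$\zeta^{*} \to \frac{2\Phi(\theta/2) - 1}{2\Phi(\theta/2)}. \qquad (30)$$

And then if $\theta \uparrow \infty$, $\zeta^{*} \to 1/2$, the maximum achievable breakdown point.

(3) If we use the memoryless robust algorithm, that is, if in (14) we replace $g_i(w_1^i)$ by $\ln\frac{f_1}{f_0}(w_n)\big|_{w_n = w'_n}$, then again we have the same breakdown point. But our algorithm is more efficient because

$$E_{\mu_1}\left\{\ln\frac{f_1}{f_0}(w_n)\Big|_{w_n = w'_n}\right\} = \frac{\theta}{1+a^2}\left[E_{\mu_1} w'_n - \frac{\theta}{2}\right] < \frac{\theta}{(1-a)^2}\left[E_{\mu_1} w'_n - \frac{\theta}{2}\right] = E_{\mu_1}\{g'(w_{-\infty}^{0})\} \qquad (31)$$

and $1 + a^2 > (1-a)^2 = 1 + a^2 - 2a$ for $a > 0$. However, if $a < 0$, then one should use the memoryless algorithm for higher efficiency at the same breakdown point.

The influence function

From the definition in (19),

$$IF_{\mu_1}(z) = \lim_{\zeta \to 0} \frac{\theta}{(1-a)^2} \cdot \frac{\left[(1-\zeta)\, E_{\mu_1} w'_0 + \zeta\, w'_0(z) - \frac{\theta}{2}\right] - \left[E_{\mu_1} w'_0 - \frac{\theta}{2}\right]}{\zeta} = \frac{\theta}{(1-a)^2}\left(w'_0(z) - E_{\mu_1} w'_0\right).$$

Since $|w'_0(z)| \le \max(|w'_m|, |w'_M|)$ and $w'_0(z)$ is a continuous function of $z$, $IF_{\mu_1}(z)$ is continuous and bounded.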
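The breakdown levels and the influence function of this example can be evaluated numerically once $E_{\mu_i} w'_0$ is known. The sketch below assumes the Gaussian marginals stated above ($W_n \sim N(0, 1+a^2)$ under $\mu_0$ and $N(\theta, 1+a^2)$ under $\mu_1$) and uses the standard closed form for the mean of a clipped Gaussian variable. The parameter values and function names are hypothetical.

```python
import math

def Phi(x):
    """Standard Gaussian distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    """Standard Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def clipped_normal_mean(m, s, lo, hi):
    """E[min(max(X, lo), hi)] for X ~ N(m, s^2)."""
    al, be = (lo - m) / s, (hi - m) / s
    return (lo * Phi(al) + hi * (1.0 - Phi(be))
            + m * (Phi(be) - Phi(al)) + s * (phi(al) - phi(be)))

def breakdown_point(theta, a, d):
    """Return (min(zeta_0, zeta_1), zeta_0, zeta_1) for clipping level d."""
    s = math.sqrt(1.0 + a * a)
    c = d * (1.0 + a * a) / theta            # half-width of the clipping interval
    lo, hi = theta / 2.0 - c, theta / 2.0 + c
    e1 = clipped_normal_mean(theta, s, lo, hi) - theta / 2.0   # E_mu1 w'_0 - theta/2
    e0 = theta / 2.0 - clipped_normal_mean(0.0, s, lo, hi)     # theta/2 - E_mu0 w'_0
    z1 = 1.0 / (1.0 + c / e1)
    z0 = 1.0 / (1.0 + c / e0)
    return min(z0, z1), z0, z1

def influence_mu1(z, theta, a, d):
    """theta / (1 - a)^2 * (w'(z) - E_mu1 w'_0)."""
    s = math.sqrt(1.0 + a * a)
    c = d * (1.0 + a * a) / theta
    lo, hi = theta / 2.0 - c, theta / 2.0 + c
    w_z = min(max(z, lo), hi)
    return theta / (1.0 - a) ** 2 * (w_z - clipped_normal_mean(theta, s, lo, hi))

theta, a = 1.0, 0.5                      # hypothetical example values
for d in (0.5, 1.0, 2.0):
    print(d, breakdown_point(theta, a, d))
print(influence_mu1(-1e9, theta, a, 1.0), influence_mu1(1e9, theta, a, 1.0))
```

Under these assumptions the two levels coincide by symmetry about $\theta/2$, they stay below 1/2, and tighter clipping (smaller `d`) raises the breakdown point.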

Because of space limitations, we cannot present another example, the numerical results, or a comparison of the suggested algorithm with the one studied for the Markovian situation, although it would be meaningful to apply the suggested algorithm to Markov processes themselves.


IV.  Conclusion 

We have discussed the issue of robustness in time series from an abstract point of view and pointed out the general failure of operations designed to be robust under the i.i.d. and Markovian setups. A stronger notion of continuity of the point function $\psi(x)$ was needed to achieve robustness of operations which make use of the entire distribution of the process. A simple technique based on Huber's approach was used to obtain the pseudo-observations which replaced the true ones. One could benefit by using higher-order densities to achieve the same goal. This of course needs further study, as one can numerically (if not analytically) optimize with respect to the order itself.

Another  approach  for  designing  robust  operations 
would  be  to  put  a  bound  on  breakdown  point  and  maximize 
the  efficiency  or  vice  versa. 

Ideally  one  would  like  to  extend  the  minimax  optimal¬ 
ity  results  under  i.i.d.  set  up  to  non  i.i.d.  set  up.  This  seems 
to  be  a  formidable  task.  One  should  perhaps  start  from 
Markovian  set  up  first. 